DeepSea: Progressive Workload-Aware Partitioning of Materialized Views in Scalable Data Analytics
Authors
Abstract
Selective materialization of intermediate query results as views is an effective method for improving query performance. In this paper, we extend this technique to adaptively partition views based on the access patterns of a workload. That is, we collect information about the selection conditions of queries at runtime and use this information to determine fragment boundaries for the initial partitioning when materializing a view. We then refine view partitions over time based on the selection conditions of incoming queries. We present a novel cost-benefit model for partitioned views, as well as a candidate view and fragment selection approach, both of which exploit the nature of partitioned views by taking the correlation among view fragments into account. We also present DeepSea, an implementation of these techniques built on top of Hive. Our experimental evaluation demonstrates the effectiveness of partitioned views, improving performance by up to an order of magnitude compared to state-of-the-art approaches.
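The idea of deriving fragment boundaries from observed selection conditions can be illustrated with a minimal sketch. This is not DeepSea's algorithm: it omits the cost-benefit model and the candidate selection step entirely, and simply cuts a view at every endpoint of an observed range predicate. The function names, the half-open [low, high) interval convention, and the numeric domain are all assumptions made for illustration.

```python
def initial_boundaries(ranges, domain):
    """Derive sorted fragment boundaries from observed (low, high) ranges.

    Hypothetical helper: every range endpoint becomes a cut point,
    clamped to the attribute domain.
    """
    lo, hi = domain
    cuts = {lo, hi}
    for a, b in ranges:
        cuts.add(min(max(a, lo), hi))
        cuts.add(min(max(b, lo), hi))
    return sorted(cuts)


def fragments(boundaries):
    """Turn sorted boundaries into half-open fragments [low, high)."""
    return list(zip(boundaries, boundaries[1:]))


def refine(boundaries, query_range):
    """Split existing fragments at the endpoints of a new query range."""
    lo, hi = boundaries[0], boundaries[-1]
    cuts = set(boundaries)
    for p in query_range:
        if lo < p < hi:  # endpoints outside the domain add no cut
            cuts.add(p)
    return sorted(cuts)


def fragments_to_scan(boundaries, query_range):
    """Return the fragments that overlap the half-open query range."""
    a, b = query_range
    return [(l, h) for l, h in fragments(boundaries) if l < b and a < h]


# Ranges seen before materialization decide the initial partitioning.
bounds = initial_boundaries([(10, 20), (15, 30)], domain=(0, 100))
# A later query refines the partitioning further.
refined = refine(bounds, (40, 50))
```

In this sketch a query over [10, 20) only has to read the two fragments covering that range instead of the whole view. A real system would have to weigh each additional cut against the overhead of maintaining more fragments, which is the role the abstract assigns to the cost-benefit model.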
Similar resources
Cost-aware view materialization for highly distributed datasets
Querying large datasets distributed over thousands of end systems is a challenge for existing distributed querying infrastructures. High data availability requires either replicating or centralizing the dataset, but both require infeasibly high network bandwidth. In-situ querying provides low bandwidth overheads but requires users to tolerate low data availability. This paper advocates partial da...
Sweet KIWI: Statistics-Driven OLAP Acceleration using Query Column Sets
KIWI is a SQL-on-Hadoop system enabling batch and interactive analytics for big data. In database systems, materialized views, stored pre-computed results for queries, are one of the most commonly used techniques to improve the query processing speed. However, the key challenge in using materialized views is maintaining their freshness as base data changes. This paper introduces a new approach ...
Bringing Together Partitioning, Materialized Views and Indexes to Optimize Performance of Relational Data Warehouses
There has been a lot of work on optimizing the performance of relational data warehouses. Three major techniques can be used for this objective: enhanced index schemes (join indexes, bitmap indexes), materialized views, and data partitioning. The existing research prototypes or products use materialized views alone, indexes alone, or a combination of them, but none of the prototypes use all three...
Dynamic Construction and Administration of the Workload Graph for Materialized Views Selection
To offer the best performance to each application (e.g., decision support systems), database management systems need administration and optimization. In order to reduce administration costs and to provide continuous adaptation to changing workload patterns, self-management techniques have been a focus of researchers and DBMS vendors in recent years. One important topic in this area ...
Coupling Materialized View Selection to Multi Query Optimization: Hyper Graph Approach
Materialized views are queries whose results are stored and maintained in order to facilitate access to data in the underlying base tables of extremely large databases. Selecting the best materialized views for a given query workload is a hard problem. Studies on view selection have considered sharing common subexpressions and other multi-query optimization techniques. Multi-Query Optimizati...
Publication date: 2017